AN APPROXIMATION TO THE DISTRIBUTION OF THE SAMPLE VARIANCE

By

Herbert Solomon and Michael A. Stephens

1. INTRODUCTION.

In a recent paper, Tan and Wong (1977) investigated approximations to the distribution of the sample variance $s^2 = \sum (x_i - \bar{x})^2 / (n-1)$ when the observations are drawn from parent populations which are not normal. In particular, they compared an approximation by Box (1953), using two moments of $s^2$, with a generalization of Box's approximation introduced by themselves, and with an approximation by Roy and Tiku (1962). The last two approximations use k moments of $s^2$, with $k \ge 4$, to approximate the density by a series expansion in Laguerre polynomials. The first k cumulants of $s^2$, and hence the first k moments, for any parent population with cumulants up to order 2k, can be found by Fisher's k-statistics. Fisher (1928) gives the formulas for the first six cumulants of $s^2$ in terms of parent cumulants; the first four moments, both about the origin and about the mean, were given by Church (1925). For order higher than k = 6, in general, the cumulants or moments would be very complicated to calculate, and present obvious possibilities of error.
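The step from parent cumulants to moments of $s^2$ can be sketched in a few lines of Python. This is only an illustration: the formulas coded below are the standard sampling cumulants of the second k-statistic, which are assumed here and not reproduced in this paper, and the function names `s2_cumulants` and `moments_about_origin` are hypothetical. Three cumulants of $s^2$ require parent cumulants up to order six.

```python
def s2_cumulants(kappa, n):
    """First three cumulants of the sample variance s^2 for sample size n.

    `kappa` maps order -> parent cumulant; orders 2, 3, 4 and 6 are used.
    The expressions are the usual k-statistic results (an assumption of
    this sketch, not taken from the paper).
    """
    k2, k3, k4, k6 = kappa[2], kappa[3], kappa[4], kappa[6]
    c1 = k2
    c2 = k4 / n + 2.0 * k2**2 / (n - 1)
    c3 = (k6 / n**2
          + 12.0 * k4 * k2 / (n * (n - 1))
          + 4.0 * (n - 2) * k3**2 / (n * (n - 1) ** 2)
          + 8.0 * k2**3 / (n - 1) ** 2)
    return c1, c2, c3


def moments_about_origin(c1, c2, c3):
    """Convert the first three cumulants into moments about the origin."""
    m1 = c1
    m2 = c2 + c1**2
    m3 = c3 + 3.0 * c2 * c1 + c1**3
    return m1, m2, m3


# Example: a normal parent with variance 4 (all higher cumulants vanish).
normal_parent = {2: 4.0, 3: 0.0, 4: 0.0, 6: 0.0}
print(moments_about_origin(*s2_cumulants(normal_parent, n=11)))
```

For a normal parent the result can be checked against the known cumulants of a scaled chi-square variable, which is a convenient sanity test before trying a non-normal parent.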

For comparing the approximations, Tan and Wong used as parent population a mixture of two normal distributions with the same variance, so that the mixing proportion p and the difference $\Delta$ between the means were two parameters which could be varied. By changing these and the sample size n, very different distributions of $s^2$ can be produced; for example, as $\Delta$ becomes larger the distribution of $s^2$ may become bimodal. Fortunately, for this parent population, Tan and Wong could compute the k-th moment of $s^2$ relatively easily for quite large k, and they also derived the exact density of $s^2$. Thus for this distribution exact probabilities could be compared with the approximations, with k = 4, 6 and 10 moments used for the Tan-Wong and Roy-Tiku approximations. Naturally the approximations using more moments were much better, with, on the whole, the Roy-Tiku approximation better than the Tan-Wong approximation when the same number of moments is used. However, the Roy-Tiku approximation can give negative values for the density, and Tan and Wong discuss this difficulty in some detail.

In this paper we give an approximation which has yielded good results. It uses only three moments of $s^2$, is relatively easy to compute, and never gives negative densities.


2. APPROXIMATION BY A GENERALIZED CHI-SQUARE DENSITY.


The approximation now suggested can be regarded as a generalization of Box's method. Box approximated the density of $s^2$ by that of $a\chi^2_b$, where a and b are constants determined by equating the first two moments of $s^2$ to those of $a\chi^2_b$; we now propose that the density of $s^2$ should be approximated by that of Y, where $Y = (cW)^k$ and where W has a $\chi^2$ distribution with r degrees of freedom. This is written symbolically as $s^2 = (c\chi^2_r)^k$. The parameters c, r, and k are found by equating the first three moments of $s^2$, about the origin, to those of Y; the latter are given by

$$\mu_1'(Y) = (2c)^{k}\,\Gamma(k+t)/A, \qquad \mu_2'(Y) = (2c)^{2k}\,\Gamma(2k+t)/A, \qquad \mu_3'(Y) = (2c)^{3k}\,\Gamma(3k+t)/A,$$

where $t = r/2$ and $A = \Gamma(t)$. Only the first three moments of $s^2$ are used to fit the approximate density; the fact that $s^2$ has a density with zero as its lower endpoint is being indirectly used also. The approximation Y has been employed with success by the authors for other random variables known to be positive (see, e.g., Solomon and Stephens, 1977) and the present application for the sample variance provides a natural situation to examine its usefulness once again.
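The moment-matching step can be sketched numerically as follows. This is a minimal illustration and not the authors' FORTRAN program: it assumes SciPy is available, eliminates c by working with the scale-free ratios $\mu_2'/\mu_1'^2$ and $\mu_3'/\mu_1'^3$, and solves the two remaining equations for k and t with a general-purpose root finder. The function names `y_log_moment` and `fit_generalized_chi_square` are hypothetical, and convergence from the simple starting point used here is not guaranteed for every set of moments.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import gammaln


def y_log_moment(j, c, r, k):
    """Log of the j-th moment about the origin of Y = (c W)^k, W ~ chi-square(r)."""
    t = r / 2.0
    return j * k * np.log(2.0 * c) + gammaln(j * k + t) - gammaln(t)


def fit_generalized_chi_square(m1, m2, m3):
    """Return (c, r, k) matching three moments about the origin (a sketch)."""
    log_ratio2 = np.log(m2) - 2.0 * np.log(m1)
    log_ratio3 = np.log(m3) - 3.0 * np.log(m1)

    def equations(z):
        # Solve in log scale so that k and t stay positive.
        k, t = np.exp(z)
        g0, g1 = gammaln(t), gammaln(k + t)
        g2, g3 = gammaln(2.0 * k + t), gammaln(3.0 * k + t)
        return [g2 + g0 - 2.0 * g1 - log_ratio2,
                g3 + 2.0 * g0 - 3.0 * g1 - log_ratio3]

    # Start from k = 1, i.e. the two-moment (Box-type) fit.
    t_start = 1.0 / (np.exp(log_ratio2) - 1.0)
    k, t = np.exp(fsolve(equations, [0.0, np.log(t_start)]))
    # With k and t fixed, c follows from the first moment.
    c = 0.5 * np.exp((np.log(m1) + gammaln(t) - gammaln(k + t)) / k)
    return c, 2.0 * t, k


# Round trip: moments generated from known parameters are fitted again.
known = (1.5, 6.0, 0.8)  # illustrative c, r, k
target = [np.exp(y_log_moment(j, *known)) for j in (1, 2, 3)]
print(fit_generalized_chi_square(*target))
```

Working with logarithms of gamma functions avoids overflow when r is large, and the round trip at the end gives a quick check that the solver has matched all three moments.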


3. ACCURACY OF THE APPROXIMATION.


Tan and Wong (1977) consider the distribution of $s^2$ for samples of size n from the mixed normal parent population with density

$$f(x) = p\,\phi_1(x;\mu_1,\sigma^2) + q\,\phi_2(x;\mu_2,\sigma^2) \qquad (1)$$

where 0 < p < 1 and q = 1 - p. In their Table 2, they give a comparison of exact values of the cumulative distribution of $Q = (n-1)s^2$ (although the table heading refers to $s^2$) with the Box approximation, their own approximation and the Roy-Tiku approximation; for the latter two approximations they use 4, 6, and 10 moments for $s^2$. In Table 1 we give exact values E, approximate values S given by the $(c\chi^2_r)^k$ approximation, and values $W = 10^4(S-E)$, of the cumulative distribution of Q, for a set of values of the parameters, and also of values of x, used by Tan and Wong. In the distribution (1) we have used $\sigma^2 = 4$, as did Tan and Wong.

In Table 2, values of W are recorded, together with values given by the Tan-Wong approximation ($W_{TW}$) and those given by the Roy-Tiku approximation ($W_{RT}$) when only 4 moments are used; these have been taken from Table 2 of Tan and Wong (1977). It can be seen that the new approximation S fares well on the whole. S is clearly unimodal, so cannot follow the density of $s^2$ when this tends towards or becomes bimodal; this is the case for low p and large $\Delta$, for example. S is then poor in the lower tail, but nevertheless gives good results in the upper tail, which contributes most to the three moments used in the fit. Pearson curves were also fitted to the $s^2$ distribution, using either four moments or, when possible, three moments and the lower endpoint (for this technique see Solomon and Stephens, 1978). These curves are also unimodal and give positive densities; however, sometimes the Pearson curve fit gives no lower endpoint, and overall it provides little or no improvement over the S approximation. The latter has the advantage of using only one $\chi^2_r$ distribution, and values of the cumulative $\chi^2_r$ are nowadays readily available from computer routines. Thus probabilities and percentage points for $s^2$ can be approximated easily once the fit has been made. A FORTRAN program is available for this purpose from the authors. We have noticed also that the fit will usually be very good if r is slightly altered, with the necessary adjustment made to c and k to match the first two moments; thus the cumulative distributions tabulated for r at intervals of 0.2 in Pearson and Hartley (1972) might also prove useful if r were adjusted to be one of the entries.
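Once c, r and k are known, probabilities and percentage points follow from a single chi-square routine, since for k > 0 the event $(cW)^k \le x$ is the same as $W \le x^{1/k}/c$. The sketch below assumes SciPy's `scipy.stats.chi2` (which accepts non-integer degrees of freedom) and is not the authors' FORTRAN program; the names `approx_cdf` and `approx_ppf` are hypothetical.

```python
from scipy.stats import chi2


def approx_cdf(x, c, r, k):
    """P(Y <= x) for Y = (c W)^k with W ~ chi-square(r); assumes x > 0, k > 0."""
    return chi2.cdf(x ** (1.0 / k) / c, df=r)


def approx_ppf(prob, c, r, k):
    """Percentage point of Y: the x with P(Y <= x) = prob."""
    return (c * chi2.ppf(prob, df=r)) ** k


# Fitted values quoted in Table 1 for n = 11, p = .1 and difference 2.
c, r, k = 0.999, 10.309, 1.023
print(approx_cdf(10.0, c, r, k))   # compare with the tabulated S = .4859
print(approx_ppf(0.95, c, r, k))   # an approximate upper 5% point of Q
```

The parameters in Table 1 are fitted to Q as tabulated there, so the argument x in this example is on the scale of the X values in the table.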

A further advantage of the proposed approximation is that it will be very easy to simulate the distribution of $s^2$ from any parent with at least six low-order cumulants.
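Simulation under the approximation amounts to drawing chi-square variates and transforming them. A minimal sketch, assuming NumPy's random generator and the hypothetical names `simulate_y` and `theoretical_mean`; the parameter values are those quoted in Table 1 for n = 11, p = .1 and difference 2, used purely as an illustration.

```python
import math

import numpy as np


def simulate_y(c, r, k, size, seed=None):
    """Draw `size` values of Y = (c W)^k with W ~ chi-square(r)."""
    rng = np.random.default_rng(seed)
    return (c * rng.chisquare(r, size=size)) ** k


def theoretical_mean(c, r, k):
    """First moment of Y: (2c)^k Gamma(k + t) / Gamma(t), with t = r/2."""
    t = r / 2.0
    return (2.0 * c) ** k * math.exp(math.lgamma(k + t) - math.lgamma(t))


draws = simulate_y(0.999, 10.309, 1.023, size=100_000, seed=1)
print(draws.mean(), theoretical_mean(0.999, 10.309, 1.023))
```

Comparing the sample mean with the first-moment formula is a simple check that the simulated values follow the fitted distribution.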


Partially  supported  by  the  National  Science  and  Engineering  Research 
TABLE 1

$Q = (n-1)s^2$ is based on a sample of size n from density (1), with $\sigma^2 = 4$ and $\Delta$ the difference between the means. The table gives exact (E) values of P(Q < X), values (S) obtained by the generalized chi-square approximation, and the difference $W = 10^4(S-E)$.

n = 3,  p = .1,  Δ = 2;   r = 2.050,  k = 1.018,  c = 1.041
    X:     0.3      1.0      3.0      4.0      8.0
    E:   .1293    .3693    .7481    .8404    .9740
    S:   .1290    .3694    .7481    .8404    .9740
    W:       3        1        0        0        0

n = 3,  p = .1,  Δ = 6;   r = 1.149,  k = 0.875,  c = 4.123
    X:     0.3      2.0      5.0      7.0     13.0
    E:   .1040    .4868    .7597    .8424    .9551
    S:   .1498    .4789    .7457    .8393    .9590
    W:     458      -79     -140      -31       39

n = 11,  p = .1,  Δ = 2;   r = 10.309,  k = 1.023,  c = 0.999
    X:     6.0      8.0     10.0     16.0     22.0
    E:   .1469    .3092    .4859    .8547    .9715
    S:   .1470    .3092    .4859    .8546    .9715
    W:       1        0        0       -1        0

n = 11,  p = .1,  Δ = 6;   r = 3.930,  k = 0.754,  c = 12.548
    X:     8.0     11.0     17.0     26.0     38.0
    E:   .1384    .2701    .5245    .8018    .9594
    S:   .1379    .2579    .5189    .8066    .9598
    W:      -5     -122      -56       48        4

n = 11,  p = .1,  Δ = 10;   r = 0.833,  k = 0.370,  c = 30125
    X:     5.0     11.0     35.0     53.0     77.0
    E:   .0342    .2035    .5759    .8259    .9697
    S:   .0705    .1709    .5882    .8239    .9688
    W:     363     -326      123      -20       -9

n = 3,  p = .4,  Δ = 2;   r = 1.896,  k = 0.962,  c = 1.379
    X:     0.1      1.0      3.0      4.0      8.0
    E:   .0390    .3291    .7000    .8001    .9613
    S:   .0398    .3292    .6998    .8000    .9614
    W:       8        1       -2       -1        1

n = 11,  p = .4,  Δ = 2;   r = 9.392,  k = 0.954,  c = 1.498
    X:     7.0     11.0     15.0     19.0     23.0
    E:   .1518    .4522    .5994    .8818    .9559
    S:   .1518    .4521    .5994    .8818    .9559
    W:       0       -1        0        0        0

n = 11,  p = .4,  Δ = 6;   r = 6.203,  k = 0.609,  c = 49.80
    X:    27.0     33.0     39.0     45.0     50.0
    E:   .3638    .5818    .7631    .8828    .9406
    S:   .3667    .5820    .7611    .8812    .9400
    W:      29        2      -20      -16       -6

n = 11,  p = .4,  Δ = 10;   r = 2.829,  k = 0.309,  c = 432081
    X:    47.0     71.0     83.0     89.0    107.0
    E:   .1127    .5258    .7579    .8457    .9797
    S:   .1236    .5207    .7433    .8311    .9710
    W:     109      -51     -146     -146      -87


TABLE 2

Values of W (see Table 1) given by the new approximation, compared with values W_TW and W_RT given by the Tan-Wong and the Roy-Tiku approximations using only four moments.

n = 3,  p = .1,  Δ = 2
    X:        0.3      1.0      3.0      4.0      8.0
    W:          3        1        0        0        0
    W_TW:       9        0       -4        0        1
    W_RT:       0        0        0        0        0

n = 3,  p = .1,  Δ = 6
    X:        0.3      2.0      5.0      7.0     13.0
    W:        458      -79     -140      -31       39
    W_TW:     311      -82      -89        1       30
    W_RT:     -11       -7       14        7       -5

n = 11,  p = .1,  Δ = 2
    X:        6.0      8.0     10.0     16.0     22.0
    W:          1        0        0       -1        0
    W_TW:      -1       -2       -1        1        0
    W_RT:       0        0        0        0        0

n = 11,  p = .1,  Δ = 6
    X:        8.0     11.0     17.0     26.0     38.0
    W:         -5     -122      -56       48        4
    W_TW:     -25      -90       -3       36      -11
    W_RT:     -40       55       63      -65       15

n = 11,  p = .1,  Δ = 10
    X:        5.0     11.0     35.0     53.0     77.0
    W:        363     -326      123      -20       -9
    W_TW:      65     -418      249     -110      -75
    W_RT:    -246     -609     -331     -217      129

n = 3,  p = .4,  Δ = 2
    X:        0.1      1.0      3.0      4.0      8.0
    W:          8        1       -2       -1        1
    W_TW:     -14       -2        9        3       -3
    W_RT:       0        0        1        0        0


TABLE 2 (Continued)

n = 11,  p = .4,  Δ = 2
    X:        7.0     11.0     15.0     19.0     23.0
    W:          0       -1        0        0        0
    W_TW:       2        3       -2       -2        0
    W_RT:       0        0        0        0        0

n = 11,  p = .4,  Δ = 6
    X:       27.0     33.0     39.0     45.0     50.0
    W:         29        2      -20      -16       -6
    W_TW:      33      -40      -51      -18        7
    W_RT:     114       29      -61      -82      -56

n = 11,  p = .4,  Δ = 10
    X:       47.0     71.0     83.0     89.0    107.0
    W:        109      -51     -146     -146      -87
    W_TW:     240     -216     -182     -119      -24
    W_RT:     262      191     -274     -416     -342